UDTFs in RPG for the Win!

I’m a firm believer in separating business rules from the user interface. A more robust, modern system starts with the ability to write the business rules as “services” that are callable from any user interface or batch process and that you can test independently from the rest of the application. One of my favorite tools to accomplish this is a User Defined Table Function (UDTF)—a function that implements a virtual “table” (or physical file, as we’ve traditionally called it) that you can query from an SQL SELECT statement. IBM i has supported UDTFs since 5.3.

All database software that offers UDTFs lets you write them in SQL, of course. But DB2 for i is unique because it lets you write UDTFs in any high-level language, including RPG! So you can write your business logic in a UDTF and write the UDTF in RPG, letting you take advantage of your company’s existing RPG code and your staff’s existing RPG skills. This article shows you how.

When Are UDTFs Appropriate?

UDTFs are an appropriate tool anytime you want to write business logic to calculate data to display in a “list” or “table.” To put it another way, they’re great for the sort of data we’d traditionally load into a subfile or print on a report. When writing a new report, I write my business logic as a UDTF. Then, I write a rather simple RPG program that reads the data from the UDTF and prints it. A few months later, I’m bound to have a user request for the report to display on the screen instead. When that happens, I write another small program that reads the same UDTF but outputs it on the screen. Oftentimes, a user will want the report emailed in Excel format, displayed in an HTML page, or made available through a web service that a mobile device can call. I can do all of this the same way I wrote the original report simply by calling the existing UDTF and outputting the data to a different medium. It works great!

A UDTF Is an SQL Function

Because a UDTF is an SQL function, it’s called from an SQL statement, where it takes the place of a table. To give you an idea of how to call one, compare the code to read data from a traditional database table (Figure 1) with the code to call a UDTF (Figure 2). As you can see, the UDTF is called in the FROM clause of a SELECT statement. You must code the function name inside the parenthesis that follows the word TABLE. The part that says “as XYZ” (a.k.a. a “correlation clause”) is required. The basic example in Figure 2 has no bells or whistles, but if you’re already familiar with SQL SELECT statements, it should suffice to get you started. Rest assured that you can use all the functionality of the SELECT statement on the output of the table function. For example, you can use the ORDER BY clause to sort data emitted from the UDTF or employ a WHERE clause to search the UDTF’s output for a particular string. You can even use aggregate functions such as SUM or AVG to get statistics on the columns. One of the most compelling reasons to write your RPG code as a UDTF is that doing so makes it available to call from many places. Think about this: Any software that can run an SQL statement against your system can run your UDTF! Not only can you call it from an RPG program, but you can also call it from Java, PHP, or .NET with ease because these environments can also run SQL statements against a DB2 for i database.

Figure 1: A traditional SQL SELECT statement

select col1, col2, col3  from mytable as XYZ

Figure 2: An SQL SELECT statement that reads from a UDTF

select col1, col2, col3 from table(myUDTF(arg1, arg2)) as XYZ

How to Write the RPG Code for a UDTF

The database software built into the OS will call your program repeatedly in a loop, similarly to the way a traditional RPG program may read a file in a loop. In a traditional RPG program, you can use the OPEN opcode to open a file, then code a loop that reads that file until the %EOF BIF tells you that you’ve reached the end of the file, and finally call the CLOSE opcode to close the file.

A UDTF follows this same pattern except that the roles are reversed. Remember that in a UDTF, your code pretends to be a file. The OS determines when to open or close that file, and the loop to read the records resides somewhere inside the OS rather than in your program. Because your program is acting as the file, it’s responsible for notifying the system when an error occurs or when the end of the file has been reached.

How does that work? The OS calls your program repeatedly and passes a parameter known as the “call type.” SQL will set this parameter to a value of -1 to indicate “open,” 0 to specify “fetch,” or 1 to denote “close.” It also passes you an SQL state and error message parameter that you use to notify the system of errors. Setting the SQL state to 02000 tells the system that it has reached the end of your file.

The diagram in Figure 3 shows the flow of events for a UDTF. First, SQL calls your program with open, then calls it repeatedly with fetch until you return an SQL state of 02000, and finally it calls the program with close.

Figure 3: The flow of events in a UDTF

To communicate between your program and the OS, you use parameters. Figure 4 shows the parameters required for a UDTF to work. SQL can call your RPG code either as a program (*PGM object) or as a subprocedure in a *SRVPGM object. I recommend using D-specs to define your parameters in a procedure interface (PI) definition; this approach works nicely for both call types. I’ll explain how to describe your program to SQL in a moment, but first let’s look an example of the RPG code.

Figure 4: Parameters that SQL passes to RPG in a UDTF

A:   D CustInvoices    pi
     D    custno                      4p 0 const
     D    daysback                    3p 0 const
B:   D    inv_no                      5a
     D    inv_date                     D   datfmt(*iso)
     D    inv_wgt                     9p 1
     D    inv_amt                     9p 2
     D    pay_sts                     1A
C:   D    n_custno                    5i 0 const
     D    n_daysback                  5i 0 const
D:   D    n_inv_no                    5i 0
     D    n_inv_date                  5i 0
     D    n_inv_wgt                   5i 0
     D    n_inv_amt                   5i 0
     D    n_pay_sts                   5i 0
E:   D    State_SQL                   5a
F:   D    Function                  517a   varying const
     D    Specific                  128a   varying const
G:   D    errorMsg                   70a   varying
H:   D    CallType                   10i 0 const

An Example of a UDTF in RPG Code

Suppose I’m tasked with writing a report that shows all of a customer’s invoices from the past 30 days. For each invoice returned, I want to display the invoice number, date, weight, amount, and payment status. In my shop, this is more than a simple query. To calculate the total amount of an invoice, I must add up the line items and then apply any discounts or charges added at the bottom line. To figure out the weight, I need to determine the quantity that we shipped and multiply it by the item weight per unit. To ascertain whether an invoice was paid, I have to check for that invoice in a separate “cash file” that contains customer payments. After considering this for a moment, I decide that the best design for this program is to write a UDTF that accepts a customer number and a number of days as input parameters. The UDTF will do all the business logic to calculate the invoice list and will return it as a “virtual table” to the calling SQL statement.

Figure 4 shows the parameter list for this CustInvoices RPG routine. A parameter list for a UDTF always consists of one parameter for each input argument (at A in Figure 4), one parameter for each column (or “field”) in each record (at B), one integer to act as a null indicator for each input argument (at C), one integer to act as a null indicator for each output column in the record (at D), the SQL state (at E), the fully qualified function name and specific name (at F), the message text (at G), and the call type (at H).

The input arguments and their corresponding null indicators are the values passed from the SELECT statement to the database engine in the OS and subsequently passed from the OS to your RPG code. The input parameters are always populated; it doesn’t matter whether the call type is open, fetch, or close.

You return the column (or “field”) values of each row (or “record”) in the output column parameters and their corresponding null indicators. For each call with a call type of fetch, you should return one row. The OS ignores the values in these parameters if the call type is not fetch or if the SQL state indicates end of file or error.

The SQL State and message text are output-only parameters that inform the OS when my virtual file reaches its end or when an error occurs. Call type is an input-only parameter that the OS uses to tell me which operation it’s requesting. The call type -1 indicates an OS request for an open operation, 0 to fetch a record, and 1 to close my virtual file.

Figure 5 shows how I structure the mainline of my UDTF code. I like to define named constants for the values I expect to receive in my call type parameter and my null indicators (at A in Figure 5). My business rules do not allow for null custno or daysback parameters, so if a null is passed, I use the SQL state and message text to report an error to the caller (at B). Finally, I run a different routine depending on whether the OS asks me to open, fetch a row, or close my virtual file (at C). Thankfully, when I received this requirement, I had existing RPG code that did some of the things I needed to do. I had one existing routine in another program that calculated an invoice amount. A separate routine determined the weight, and a third checked whether an invoice had been paid.

Figure 5: The "mainline" of the CustInvoices RPG Code

A:   D CALL_OPEN       C                   CONST(-1)
     D CALL_FETCH      C                   CONST(0)
     D CALL_CLOSE      C                   CONST(1)

     D PARM_NULL       C                   CONST(-1)
     D PARM_NOTNULL    C                   CONST(0)

      /free

B:      if n_custno = PARM_NULL
           or n_daysback = PARM_NULL;
           State_SQL = '38999';
           errorMsg = 'Null parameters not allowed!';
        endif;

C:      select;
        when CallType = CALL_OPEN;
           exsr doOpen;
        when CallType = CALL_FETCH;
           exsr doFetch;
        when CallType = CALL_CLOSE;
           exsr doClose;
        endsl;

        return;

One benefit of writing a UDTF in RPG is that I can leverage our existing code, so I don’t have to start from scratch, and that saves me time and money. Instead, I can copy the existing code into my program and modify it to make it work as a UDTF. The existing routines that I copied were written using RPG’s native I/O routines, so I used them in my UDTF as well. However, using embedded SQL to retrieve the invoice data instead of using native I/O would have worked just as nicely.

Figure 6 shows the routine that I run when the OS requests that I open my virtual table. Here, I open the files that I plan to use for native I/O (at A in Figure 6). Though I did use the routine to open files in this example, I don’t want to give you the impression that this is all that I can use it for. With the open routine, I can run any RPG code that I’d like to run once at the start of my function. Think of it as the *INZSR of a UDTF. In fact, you’ll notice that I take this opportunity to calculate the starting date of invoices to retrieve and do an SETLL on my INVCSDT (invoices by customer and date) logical file.

Figure 6. Routine run when the OS wants to open the table

    begsr doOpen;

A:     open INVCSDT;
       open ARCASH;
       open INVDET;

       startDate = %date() - %days(daysback);

       setll (CustNo: StartDate) INVCSDT;
    endsr;

When the OS wants to fetch a record, I run the routine in Figure 7. In this routine, I read the next record (by customer) from my INVCSDT file. When I reach the end of that file, there are no more invoices to retrieve for this customer, so I set the SQL state to 02000 to indicate end of file (at A in Figure 7). The remainder of the fetch routine is responsible for populating the columns in the row that I’ll return to the OS. I’ve omitted the code that calculates the invoice amount and weight to keep the figure short and simple, but you can download the complete code in all of its glory by clicking this link. Suffice it to say that this is an RPG routine, so any of the code you’d write in a typical RPG program, you can use here.

Figure 7. Routine run when the OS wants to fetch each row

   begsr doFetch;


A:     reade (CustNo) INVCSDT INV;
       if %eof(INVCSDT);
          State_SQL = '02000';
          leavesr;
       endif;
        .
        .
      // Business logic to calculate invoice amount (Total), 
      // Total Weight (TotWgt) and payment status (Status)
      // goes here.
        .
        .
       n_inv_no   = PARM_NOTNULL;
       n_inv_date = PARM_NOTNULL;
       n_inv_wgt  = PARM_NOTNULL;
       n_inv_amt  = PARM_NOTNULL;
       n_pay_sts  = PARM_NOTNULL;

       inv_no   = INV.INVNO;
       inv_date = INV.INVDAT;
       inv_wgt  = TotWgt;
       inv_amt  = Total;
       pay_sts  = Status;

    endsr;

Figure 8 shows the logic I use when the OS asks me to close my files. Here, I simply close the files I had opened for native I/O. You can use this routine to run any RPG code that you’d like to run once at the end of your function.

Figure 8. Routine run when the OS wants to close the table

    begsr doClose;
       close *all;
    endsr;

In the CustInvoices example, I’ve written my RPG code as a service program containing a single exported subprocedure. I compile my RPG code using the CRTRPGMOD (PDM option 15) and CRTSRVPGM command, just as I would any other RPG service program. So far, that’s all it is, an ordinary service program—but the next step is to turn it into an SQL function.

Turning the RPG Code into an SQL Function

You create SQL functions by running the Create Function SQL statement. When I run Create Function, SQL stores information about my routine, including the name of the service program, the name of the subprocedure, the language that I write it in (which tells SQL how to format the parameters), and more.

Figure 9 shows the Create Function statement for my CustInvoices example. At A in Figure 9, I define the input parameters to pass to my function when it’s called from the SELECT statement. These parameters correspond directly to the input arguments I defined in my RPG code (A in Figure 4). Next, I tell the OS that my function will return a virtual table, and I define the columns that my table should contain (B in Figure 9). These correspond directly to the output columns in my RPG code (B in Figure 4). I don’t need to define the other parameters in my RPG code because the OS automatically adds them as part of the DB2SQL parameter style. DB2SQL is the only parameter style allowed for UDTFs.

Figure 9. Setting Up the Function Call in SQL

Create Function CustInvoices
A:     ( 
          custno   Decimal(4, 0),
          daysback Decimal(3, 0) 
       )
B:     Returns Table
       (
          inv_no     char(5),
          inv_date   date,
          inv_wgt    decimal(9, 1),
          inv_amt    decimal(9, 2),
          pay_sts    char(1)
       )
C:     external name 'LIBSCK/CUSTINV(CUSTINVOICES)'
       program type sub
D:     language rpgle
E:     parameter style db2sql
F:     no sql
G:     not deterministic
       external action
H:     disallow parallel;

External name and program type sub (at C in Figure 9) give the OS the name of the library, service program, and subprocedure to call. The subprocedure name is in parentheses and is case-sensitive. Type “sub” means that it’s a subprocedure call. To specify a call to a program call instead, remove the subprocedure name from the external program clause and change the program type to “main.”

At D, language rpgle informs the system that my program is written in ILE RPG. This instructs SQL on how to format any variables whose definition may vary from one language to another. For example, in ILE RPG, an SQL VARCHAR is passed with a two-byte length, followed by the character data, which matches the format of an RPG VARYING field. However, if I specified C as the language, language C would pass the data as a zero-terminated string because that’s how C programs typically work with strings. Parameter style db2sql (at E) tells SQL which parameters to pass to the function. This is how it knows to pass null indicators, function names, SQL state, message text, and call type. For UDTFs, DB2SQL is the only allowable parameter style. Other functions (that do not return a table) can use additional parameter styles, depending on how you write them.

At F, No SQL lets SQL know what sort of SQL code is in the RPG routine that it’s calling. In this example, my RPG code does not use SQL, so I coded No SQL. I could also code contains sql, reads sql data, or modifies sql data, depending on how my program uses SQL. If you think about it, your UDTF runs as part of an SQL statement because that’s what a UDTF is! So running SQL statements in your RPG code is like running an SQL statement inside another SQL statement. It’s possible to do that but requires a bit of extra work from the database engine’s viewpoint. If you specify No SQL, the database can avoid doing that extra work. Contains sql lets you run a few SQL statements (such as SET statements), but it runs slightly slower than No SQL. Reads SQL data is slower yet but lets you read data and fetch records, which is quite useful. Modifies SQL data, the slowest of all, lets nearly any SQL statement run. I have experimented with this option and have found the performance penalties tiny, and my users don’t notice the difference in performance. So if my program must modify SQL data, I don’t hesitate to code it that way. But when I have code that doesn’t use SQL (such as CustInvoices), I may as well take advantage of any extra performance by specifying No SQL.

Not Deterministic and External Action (at G) inform SQL whether your code will always return the same result given the same input parameters. If so, then we consider the code deterministic. This technique is useful because the system can cache your output and, therefore, avoid calling your routine a second time if the input data hasn’t changed; instead, it can simply use the cached copy. On the other hand, not deterministic means that the same input may cause different results, so SQL can’t rely on cached results. External Action is similar. A function that takes no external action may not need to be called each time but could use a cached result. However, a function that has an external action must not rely on cached data because it implies that the program takes additional action outside the SQL statement. In this case, the system won’t use the cache but will call the function as needed.

Disallow Parallel (at H) tells SQL that it must wait for the first invocation of your function to end before calling it again. If you wrote your program to handle multiple concurrent threads, then you can code allow parallel so that SQL can call your RPG code in multiple threads. I’ve always used Disallow Parallel for this keyword.

The Create Function SQL statement creates a function, but it does not actually create a disk object. Instead, SQL stores information about how to call your RPG code in its system catalog tables. Tables such as SYSFUNCS and SYSROUTINES are updated with information on how your function works. If you want to remove this function definition, you can do so with the DROP FUNCTION SQL statement.

Typically, the Create Function SQL statement needs to run only once per function that you create. The database will remember the functiondefinition—so you don’t need to run Create Function again later. Even if you modify your RPG code, you need not call Create Function unless the parameters change. If that case, drop the only function with the DROP FUNCTION SQL statement and re-create it with the new parameters.

Testing the UDTF

Now that I’ve created my UDTF, I test it by calling it though an SQL statement. Figure 10 shows a simple SQL statement that I can run to call my UDTF to see what output it provides. You can run this statement from any SQL interface, but for testing, I find the Run SQL Scripts tool in IBM i Navigator the most convenient. When I need to debug my RPG code, I can use the Run/Debugger option in Run SQL Scripts to start a GUI debugger on my service program, step through the code as it runs, and set breakpoints. Once you’ve tested your UDTF, you’ll want to call it from your programs. You call a UDTF from the embedded SQL interface in RPG by coding it on an EXEC SQL statement, just as you would with any other SQL SELECT statement.

Figure 10. Calling the CustInvoices UDTF

  select * from table(CustInvoices(5724, 30)) as t

UDTFs Provide Versatility

The best part about UDTFs is the versatility they provide. Today’s requirement was to print a report showing a customer’s invoices for the past 30 days, but what will tomorrow’s be? A website? An Excel document that’s emailed to a sales rep? When users make these requests, I won’t have to rewrite my business logic. Instead, I’ll call the same UDTF again and write the output differently.

Want the data sequenced by the invoice amount? No problem—simply add an ORDER BY to sort it, as Figure 11 shows. Does accounting want only the unpaid invoices? I can add a WHERE clause, as in Figure 12. Need to know the average size of a customer’s invoice over the past year? I can do that by using the SQL AVG function (Figure 13). Would you find it useful to see which customers have more than $1,000 worth of unpaid invoices in the past 180 days? I can do that by putting my UDTF in a subselect (Figure 14). I can accomplish all of these tasks without rewriting my business logic because UDTFs are so versatile.

Figure 11. Use ORDER BY to Sort the Output

  select * 
    from table(CustInvoices(5724, 30)) as t
   order by inv_amt desc

Figure 12. Only Unpaid Invoices

  select * 
    from table(CustInvoices(5724, 30)) as t
   where pay_sts = 'N'

Figure 13. Average invoice amount over the past year

  select avg(t.inv_amt)
    from table(CustInvoices(5724, 365)) as t

Figure 14. UDTF used in a sophisticated sub-select

  select c.cucust,
          (select sum(t.inv_amt)
             from table(CustInvoices(c.cucust,180)) as t
            where t.pay_sts='N') as TotalUnpaid
     from CUSTMAS as c
    where (select sum(t.inv_amt)
             from table(CustInvoices(c.cucust,180)) as t
            where t.pay_sts='N') > 1000

UDTFs for the Win

A popular phrase on the Internet these days is “for the win” (abbreviated FTW). When something is really cool and works well, people will say that thing is FTW. That’s how I feel about User Defined Table Functions—they’re really neat, fun to write, and incredibly useful. UDTFs FTW!